42 research outputs found

    A Compression-Based Toolkit for Modelling and Processing Natural Language Text

    A novel compression-based toolkit for modelling and processing natural language text is described. The design of the toolkit adopts an encoding perspective: applications are treated as problems of searching for the best encoding of different transformations of the source text into the target text. This paper describes the two-phase ‘noiseless channel model’ architecture that underpins the toolkit, which models text processing as lossless communication down a noise-free channel. The transformation and encoding performed in the first phase must be both lossless and reversible. The role of the second phase, verification and decoding, is to verify that the target text produced by the application has been communicated correctly. This paper argues that this encoding approach has several advantages over the decoding approach of the standard noisy channel model. The concepts abstracted by the toolkit’s design are explained together with details of the library calls. Pseudo-code is also given for a number of the algorithms behind the applications that the toolkit implements, including encoding, decoding, classification, training (model building), parallel sentence alignment, word segmentation and language segmentation. Some experimental results, implementation details, memory usage and execution speeds are also discussed for these applications.

    The use of frames in knowledge-based systems: a thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University

    The general aim of this study was to investigate the use of frames as a means of representing knowledge in computer knowledge-based systems. This thesis examines the application of frames to two particular situations: the playing of an opening bid in Bridge, and the recognition of birds from field observations. The Frame Representation Language FRL was used in the implementation of the two different systems. Three aspects of frames are investigated: the problems of matching two different frames; the problems of structuring frame systems for searching; and the problem of improving the interface between the frame system and the user of the knowledge base. A comparison is also made of frames with other methods of knowledge representation such as production systems and semantic networks. Finally, further areas of research into the use of frames are suggested, such as the extension of frame matching, research into the aspects of knowledge representation and application of frames to specific problems.

    Using Compression to Find Interesting One Dimensional Cellular Automata

    Automatic Correction of Arabic Dyslexic Text

    This paper proposes an automatic correction system that detects and corrects dyslexic errors in Arabic text. The system uses a language model based on the Prediction by Partial Matching (PPM) text compression scheme to generate possible alternatives for each misspelled word. The candidate list is generated using edit operations (insertion, deletion, substitution and transposition), and the correct alternative for each misspelled word is chosen on the basis of the compression codelength of the trigram. The system is compared with widely used Arabic word processing software and the Farasa tool. It gave good results compared with the other tools, with a recall of 43%, precision of 89%, F1 of 58% and accuracy of 81%.
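
    The candidate-generation step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `edit_candidates` and `best_candidate` are hypothetical, a Latin alphabet stands in for Arabic, and the `codelength` argument is a caller-supplied placeholder for the PPM trigram codelength, which the abstract does not specify in detail.

```python
def edit_candidates(word, alphabet):
    """Return every string exactly one edit operation away from `word`.

    The four operations are those named in the abstract: insertion,
    deletion, substitution and transposition of adjacent characters.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    transpositions = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    substitutions = {
        left + ch + right[1:] for left, right in splits if right for ch in alphabet
    }
    insertions = {left + ch + right for left, right in splits for ch in alphabet}
    return (deletions | transpositions | substitutions | insertions) - {word}


def best_candidate(word, alphabet, codelength):
    """Pick the candidate with the smallest codelength.

    `codelength` is an assumption: any callable mapping a string to a cost.
    In the paper this role is played by a PPM-based compression codelength.
    """
    return min(edit_candidates(word, alphabet), key=codelength)
```

    A caller would pass in a real scoring function; any model that assigns fewer bits to more plausible words fits the same interface.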

    Alpha Multipliers Breadth-First Search Technique for Resource Discovery in Unstructured Peer-to-Peer Networks

    Resource discovery in unstructured peer-to-peer (P2P) networks is important in the field of grid computing. Breadth-first search (BFS) is widely used for resource discovery in unstructured P2P networks. The technique is proven to return as many search results as possible. However, its network cost is high because the flooding of query messages can degrade the performance of the whole network. The objective of this study is to optimise the BFS technique so that it produces good search results without flooding the network with unnecessary walkers. Several resource discovery techniques used in unstructured P2P networks are discussed and categorised. P2P simulators used for P2P network experiments were studied according to characteristics such as scalability, extensibility and support status. Several network topology generators were also scrutinised in order to select the most realistic network generation model for unstructured P2P experiments. Multiple combinations of five-tuple alpha multipliers were tested to find the best set for α-BFS. In our test, α-BFS increased the query efficiency of conventional BFS from 55.67% to 63.15%.
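
    The flooding cost mentioned in this abstract can be illustrated with a small sketch of conventional BFS flooding with a time-to-live (TTL). It does not implement α-BFS, since the abstract does not define the alpha multipliers. The function name `flood_search`, the adjacency-dictionary graph and the message-counting convention (every forwarded copy counts, including duplicates that the receiver drops) are assumptions made for illustration.

```python
from collections import deque


def flood_search(graph, source, has_resource, ttl):
    """Flood a query breadth-first and count the messages it costs.

    graph        -- dict mapping each node to a list of its neighbours
    has_resource -- callable returning True for nodes that hold the resource
    ttl          -- maximum number of hops the query may travel
    """
    seen = {source}
    queue = deque([(source, None, ttl)])
    messages = 0
    hits = []
    while queue:
        node, sender, remaining = queue.popleft()
        if has_resource(node):
            hits.append(node)
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward the query
        for neighbour in graph[node]:
            if neighbour == sender:
                continue  # never send the query straight back
            messages += 1  # duplicates still cost a message
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, node, remaining - 1))
    return hits, messages
```

    Raising the TTL finds more resources but sends more messages, which is the trade-off that techniques limiting how many neighbours receive each query aim to improve.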

    A new hybrid metric for verifying parallel corpora of Arabic-English

    This paper discusses a new metric for verifying the translation quality of sentence pairs in Arabic-English parallel corpora. The metric combines two techniques, one based on sentence length and the other based on compression code length. Experiments on sample Arabic-English parallel test corpora indicate that combining the two techniques identifies satisfactory and unsatisfactory sentence pairs more accurately than either sentence length or compression code length alone. The proposed method is effective at filtering noise and reducing mistranslations, resulting in greatly improved quality. Comment: in CCSEA-201
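
    One way to picture a metric that mixes sentence length with compression code length is sketched below. The abstract does not say how the two signals are combined, so everything here is an assumption: the names `codelength`, `ratio` and `hybrid_score`, the equal weighting, and the use of `zlib` as a stand-in for the compression model used in the paper.

```python
import zlib


def codelength(text):
    """Compressed size in bytes, a rough stand-in for a model codelength."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ratio(a, b):
    """Smaller value over larger value, so 1.0 means the two are equal."""
    largest = max(a, b)
    return 1.0 if largest == 0 else min(a, b) / largest


def hybrid_score(source, target, weight=0.5):
    """Weighted mix of a length ratio and a codelength ratio.

    Scores near 1.0 suggest the pair is similar in size and information
    content; low scores flag pairs worth checking. The weighting is a
    hypothetical choice, not one taken from the paper.
    """
    length_ratio = ratio(len(source), len(target))
    code_ratio = ratio(codelength(source), codelength(target))
    return weight * length_ratio + (1 - weight) * code_ratio
```

    In practice a threshold on the score would separate satisfactory from unsatisfactory pairs, and that threshold would have to be tuned on labelled data.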